
    The American Commitment to Private International Political Communications: A View of Free Europe, Inc.

    The principal service of distributed hash tables (DHTs) is route(id, data), which sends data to a peer responsible for id, typically using O(log n) overlay hops, where n is the number of peers. Certain applications, such as peer-to-peer information retrieval, generate billions of small messages that are concurrently inserted into a DHT. These applications can generate messages faster than the DHT can process them. To support such demanding applications, a DHT needs a congestion control mechanism to efficiently handle high message loads. In this paper we provide an extended study of congestion control for DHTs: we present a theoretical analysis demonstrating that congestion control is absolutely necessary for applications that generate elastic traffic. We then present a new congestion control algorithm for DHTs. Extensive live evaluations in a ModelNet cluster and on the PlanetLab testbed show that our algorithm is nearly loss-free, fair, and provides low lookup times and high throughput under cross-load.
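    The window-based control the abstract argues for can be sketched as a simple AIMD (additive increase, multiplicative decrease) scheme over in-flight DHT messages. This is a minimal sketch of the general idea only; the class and method names are illustrative, not the paper's actual algorithm:

    ```python
    class CongestionWindow:
        """AIMD window limiting the number of in-flight DHT messages.

        Illustrative sketch: additive increase on acknowledged messages,
        multiplicative decrease on a timeout (perceived congestion).
        """

        def __init__(self, initial=4.0, max_window=256.0):
            self.window = initial          # allowed in-flight messages
            self.max_window = max_window
            self.in_flight = 0

        def can_send(self):
            return self.in_flight < self.window

        def on_send(self):
            self.in_flight += 1

        def on_ack(self):
            self.in_flight -= 1
            # additive increase: roughly +1 message per full window of acks
            self.window = min(self.max_window, self.window + 1.0 / self.window)

        def on_timeout(self):
            self.in_flight = max(0, self.in_flight - 1)
            # multiplicative decrease on perceived congestion
            self.window = max(1.0, self.window / 2.0)
    ```

    The sender only calls route() while can_send() holds, so the DHT is never offered more messages than recent acknowledgements justify.
    
    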

    On Routing in Distributed Hash Tables

    There have been many proposals for constructing routing tables for Distributed Hash Tables (DHTs). They can be classified into two groups: (A) those that assume peers are uniformly randomly distributed in the identifier space, and (B) those that allow order-preserving hash functions, which lead to a skewed peer distribution in the identifier space. Good solutions for group A have been known for many years. However, DHTs in group A are limited to randomized hashing, so queries over whole identifier ranges do not scale. Group B can handle such queries easily, but it is more difficult to connect the peers such that the resulting topology provides efficient routing, small routing tables, and balanced routing load. We present an elegant new solution for constructing an efficient DHT for group B. Our main idea is to decouple the identifier space from the routing topology. As a consequence, our DHT allows arbitrarily skewed peer distributions in the identifier space and does not require the overhead of sampling. Furthermore, table construction is cheap and does not require active replacement of lost routing entries. To evaluate routing cost and table construction under high churn, we built an efficient simulator. Using the right data structures, we can easily process the state of over one million peers in RAM.
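    The decoupling idea (routing by a peer's rank in the sorted ring rather than by identifier distance, so skew in the identifier space becomes irrelevant) can be illustrated with a small sketch. The rank-based finger rule below is an assumption for illustration, not the paper's exact construction:

    ```python
    import bisect

    def rank_fingers(sorted_ids, i):
        """Finger k points to the peer 2**k *ranks* ahead on the ring,
        independent of how the identifiers themselves are spaced."""
        n = len(sorted_ids)
        return sorted({(i + (1 << k)) % n for k in range(n.bit_length())})

    def route(sorted_ids, start, key):
        """Greedy clockwise routing toward the peer responsible for `key`
        (first peer with id >= key, wrapping around); returns hop count."""
        n = len(sorted_ids)
        target = bisect.bisect_left(sorted_ids, key) % n
        cur, hops = start, 0
        while cur != target:
            dist = (target - cur) % n
            # jump to the farthest finger that does not overshoot the target
            steps = [(f - cur) % n for f in rank_fingers(sorted_ids, cur)]
            cur = (cur + max(s for s in steps if 0 < s <= dist)) % n
            hops += 1
        return hops
    ```

    Because fingers are chosen by rank, an arbitrarily skewed identifier distribution (here, quadratically spaced ids) still yields O(log n) hops.
    
    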

    Scalable Peer-to-Peer Web Retrieval with Highly Discriminative Keys

    The suitability of peer-to-peer (P2P) approaches for full-text Web retrieval has recently been questioned because of the claimed unacceptable bandwidth consumption induced by retrieval from very large document collections. In this contribution we formalize a novel indexing/retrieval model that achieves high-performance, cost-efficient retrieval by indexing with highly discriminative keys (HDKs) stored in a distributed global index maintained in a structured P2P network. HDKs correspond to carefully selected terms and term sets appearing in a small number of collection documents. We provide a theoretical analysis of the scalability of our retrieval model and report experimental results obtained with our HDK-based P2P retrieval engine. These results show that, despite increased indexing costs, the total traffic generated with the HDK approach is significantly smaller than that obtained with distributed single-term indexing strategies. Furthermore, our experiments show that the retrieval performance obtained with a random set of real queries is comparable to that of a centralized single-term solution using the best state-of-the-art BM25 relevance computation scheme. Finally, our scalability analysis demonstrates that the HDK approach can scale to large networks of peers indexing Web-size document collections, thus opening the way towards viable, truly decentralized Web retrieval.
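    How keys become "highly discriminative" can be sketched as follows: single terms with short posting lists are kept as keys, while frequent terms are combined into term sets until the joint posting list becomes small. The parameters dmax and max_terms, and the exact combination rule, are illustrative assumptions, not the paper's precise thresholds:

    ```python
    from itertools import combinations

    def highly_discriminative_keys(docs, dmax=2, max_terms=2):
        """Sketch of HDK selection over {doc_id: text} collections.

        A key is "discriminative" if its posting list has at most dmax
        documents; frequent terms are expanded into term sets of size
        max_terms, keeping only combinations that become rare."""
        postings = {}
        for doc_id, text in docs.items():
            for term in set(text.split()):
                postings.setdefault(term, set()).add(doc_id)

        keys = {t: ids for t, ids in postings.items() if len(ids) <= dmax}
        frequent = [t for t, ids in postings.items() if len(ids) > dmax]
        for combo in combinations(sorted(frequent), max_terms):
            ids = set.intersection(*(postings[t] for t in combo))
            if 0 < len(ids) <= dmax:
                keys[" ".join(combo)] = ids
        return keys
    ```

    Each resulting key maps to a posting list of bounded size, which is what keeps per-key traffic small when the index is distributed over a DHT.
    
    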

    A high-performance distributed hash table for peer-to-peer information retrieval

    This thesis describes our research results in the context of peer-to-peer information retrieval (P2P-IR). One goal in P2P-IR is to build a search engine for the World Wide Web (WWW) that runs on up to hundreds of thousands or even millions of computers distributed all over the world. The idea is to distribute not only the content, e.g., web pages, but also an index for searching this content. The main focus of this thesis lies in designing an overlay network that is capable of transporting data between the different parts of such a distributed search engine. We built a Distributed Hash Table (DHT) that is able to sustain and efficiently handle the high traffic loads typically generated by a distributed IR application. We first analyze the behavior of a state-of-the-art DHT under heavy load and show that a DHT can suffer a so-called "congestion collapse" if it does not have a congestion control mechanism. We propose different ways of integrating congestion control into DHTs to achieve stable behavior of the system under heavy load. We then look into mechanisms for increasing the throughput of a DHT by adapting its routing function to perceived congestion. We propose an algorithm that avoids congested parts of the DHT and thus increases throughput by exploiting underutilized resources. We evaluate our fully operational DHT prototype using a ModelNet cluster and the PlanetLab testbed to assess the performance of the proposed algorithms. Furthermore, we describe an architecture of a P2P search engine for the WWW. We propose mechanisms to create a highly distributed document index. The main idea is to split the index into very small parts by using so-called highly discriminative keys. We thus achieve an extremely distributed storage of the index, which allows for high parallelism during indexing and querying. We evaluate the performance of our indexing approach with a P2P-IR prototype, which is built on top of our high-performance DHT.
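    Congestion-aware routing, i.e., steering traffic around overloaded peers while still making progress toward the target, could look roughly like this. The linear score and the penalty constant are assumptions for illustration, not the thesis's actual algorithm:

    ```python
    def pick_next_hop(candidates, target, load, m=2**32):
        """Choose a next hop among routing-table entries that make progress
        toward `target` on an identifier ring of size m.

        Illustrative sketch: the score is the remaining clockwise distance
        plus a congestion penalty of one "finger step" (m // 256, an
        arbitrary constant) per message queued at the candidate peer."""
        penalty = m // 256
        def score(peer_id):
            return (target - peer_id) % m + penalty * load.get(peer_id, 0)
        return min(candidates, key=score)
    ```

    With an empty load map this degenerates to plain greedy routing; as a peer's reported queue grows, traffic shifts to a slightly less direct but underutilized hop.
    
    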

    Aggregation of a Term Vocabulary for Peer-to-Peer Information Retrieval: a DHT Stress Test

    There has been increasing research interest in developing full-text retrieval based on peer-to-peer (P2P) technology. So far, these research efforts have largely concentrated on efficiently distributing an index. However, ranking of the results retrieved from the index is a crucial part of information retrieval.

    Aggregation of a Term Vocabulary for P2P-IR: a DHT Stress Test

    There has been increasing research interest in developing full-text retrieval based on peer-to-peer (P2P) technology. So far, these research efforts have largely concentrated on efficiently distributing an index. However, ranking of the results retrieved from the index is a crucial part of information retrieval. To determine the relevance of a document to a query, ranking algorithms use collection-wide statistics. Term frequency-inverse document frequency (TF-IDF), for example, is based on the frequency of documents containing a given term in the whole collection. Such global frequencies are not readily available in a distributed system. In this paper, we study the feasibility of aggregating global frequencies for a large term vocabulary in a P2P setting. We use a distributed hash table (DHT) for our analysis. Traditional applications of DHTs, such as file sharing, index on the order of tens of thousands of keys. Aggregation of a vocabulary consisting of millions of terms places extreme demands on a DHT implementation. We study different aggregation strategies and propose optimizations to DHTs to efficiently process large numbers of keys.
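    The aggregation pattern the paper analyzes, mapping each term to a single responsible peer that sums the partial counts, can be sketched as follows. The peer count and the hash-to-peer mapping are illustrative assumptions, not the paper's optimized DHT implementation:

    ```python
    import hashlib
    from collections import Counter

    NUM_PEERS = 8  # illustrative cluster size

    def responsible_peer(term):
        """Hash the term into the identifier space; the peer owning that
        region aggregates the term's global document frequency."""
        digest = hashlib.sha1(term.encode()).digest()
        return int.from_bytes(digest[:4], "big") % NUM_PEERS

    def aggregate_document_frequencies(peer_collections):
        """Each peer computes (term, local_df) pairs over its own documents
        and sends each pair to the term's responsible peer, which sums them
        into the global document frequency usable for TF-IDF."""
        peers = [Counter() for _ in range(NUM_PEERS)]
        for docs in peer_collections:
            local_df = Counter(t for doc in docs for t in set(doc.split()))
            for term, df in local_df.items():
                peers[responsible_peer(term)][term] += df
        return peers
    ```

    The stress on the DHT comes from the vocabulary size: every distinct term in the collection produces at least one key insertion, so millions of terms mean millions of concurrent small messages.
    
    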